This tutorial comes from Carson Sievert’s Plotly for R Master Class.

Case Study: housing sales in Texas

Loading plotly

The plotly package depends on ggplot2, which bundles a data set of monthly housing sales in Texan cities acquired from the TAMU Real Estate Center. After loading the package, the data is “lazily loaded” into your session, so you may reference it by name:

library(plotly)
txhousing
## # A tibble: 8,602 × 9
##    city     year month sales   volume median listings inventory  date
##    <chr>   <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
##  1 Abilene  2000     1    72  5380000  71400      701       6.3 2000 
##  2 Abilene  2000     2    98  6505000  58700      746       6.6 2000.
##  3 Abilene  2000     3   130  9285000  58100      784       6.8 2000.
##  4 Abilene  2000     4    98  9730000  68600      785       6.9 2000.
##  5 Abilene  2000     5   141 10590000  67300      794       6.8 2000.
##  6 Abilene  2000     6   156 13910000  66900      780       6.6 2000.
##  7 Abilene  2000     7   152 12635000  73500      742       6.2 2000.
##  8 Abilene  2000     8   131 10710000  75000      765       6.4 2001.
##  9 Abilene  2000     9   104  7615000  64500      771       6.5 2001.
## 10 Abilene  2000    10   101  7040000  59300      764       6.6 2001.
## # ℹ 8,592 more rows
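Because txhousing is an ordinary tibble, dplyr can summarize it before we plot anything. Here is a minimal sketch (our own addition, assuming dplyr is installed alongside plotly) that counts the rows and distinct cities and finds the year range. The name tx_summary is just ours:

```r
library(plotly)  # attaches ggplot2, which provides txhousing
library(dplyr)

# Collapse the whole table into a one-row summary
tx_summary <- txhousing %>%
  summarise(
    rows       = n(),               # total city/month observations
    cities     = n_distinct(city),  # how many lines a per-city plot will draw
    first_year = min(year),
    last_year  = max(year)
  )
tx_summary
```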

Good old ggplot2

Let’s see whether median house prices follow any pattern over time:

# One faded line per city; the line grouping comes from aes(group = city)
p <- txhousing %>%
  group_by(city) %>%
  ggplot(aes(x = date, y = median)) +
  geom_line(aes(group = city), alpha = 0.2)
p
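If you want to confirm that aes(group = city) is what splits the data into one line per city, one option is to inspect the layer data ggplot2 computes with ggplot_build(). This is a sketch; the names built, layer_df and n_lines are ours:

```r
library(plotly)  # also attaches ggplot2

p <- ggplot(txhousing, aes(x = date, y = median)) +
  geom_line(aes(group = city), alpha = 0.2)

# ggplot_build() computes the data each layer will draw
built    <- ggplot_build(p)
layer_df <- built$data[[1]]

# The integer `group` column takes one value per city
n_lines <- length(unique(layer_df$group))
n_lines
```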

Make it interactive

It’d be nice if we could see which city each line corresponds to when we hover. plotly makes this easy! Just wrap your ggplot object in the ggplotly() function:

class(p)
## [1] "gg"     "ggplot"
ggplotly(p)

If we just want the city name, we can specify exactly what to put in the tooltip:

ggplotly(p, tooltip = "city")
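For full control over the hover label, ggplotly() also honors an unofficial text aesthetic, which ggplot2 itself ignores when drawing. A sketch, assuming you want a label such as "City: Abilene" (the label format and the names p_text and w are our own):

```r
library(plotly)

# `text` is not a ggplot2 aesthetic, but ggplotly() uses it for the tooltip
p_text <- ggplot(txhousing, aes(x = date, y = median, text = paste("City:", city))) +
  geom_line(aes(group = city), alpha = 0.2)

# Show only the custom text in the tooltip
w <- ggplotly(p_text, tooltip = "text")
w
```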

Tidying up with plot_ly()

We can also build plotly objects directly using the plot_ly() function along with dplyr-like syntax. Why would we want to? Well, for one thing, plot_ly() recognizes and preserves groupings created with dplyr’s group_by() function:

library(dplyr)
tx_grouped <- group_by(txhousing, city)

# initiate a plotly object with date on x and median on y
p <- plot_ly(tx_grouped, x = ~date, y = ~median)

plotly_data(p)
## # A tibble: 8,602 × 9
##    city     year month sales   volume median listings inventory  date
##    <chr>   <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
##  1 Abilene  2000     1    72  5380000  71400      701       6.3 2000 
##  2 Abilene  2000     2    98  6505000  58700      746       6.6 2000.
##  3 Abilene  2000     3   130  9285000  58100      784       6.8 2000.
##  4 Abilene  2000     4    98  9730000  68600      785       6.9 2000.
##  5 Abilene  2000     5   141 10590000  67300      794       6.8 2000.
##  6 Abilene  2000     6   156 13910000  66900      780       6.6 2000.
##  7 Abilene  2000     7   152 12635000  73500      742       6.2 2000.
##  8 Abilene  2000     8   131 10710000  75000      765       6.4 2001.
##  9 Abilene  2000     9   104  7615000  64500      771       6.5 2001.
## 10 Abilene  2000    10   101  7040000  59300      764       6.6 2001.
## # ℹ 8,592 more rows
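The data that plotly_data() hands back still carries its grouping. One way to check, using dplyr’s group_vars() (the object name d is ours):

```r
library(plotly)
library(dplyr)

tx_grouped <- group_by(txhousing, city)
p <- plot_ly(tx_grouped, x = ~date, y = ~median)

# plotly_data() returns the data attached to the plot, groups included
d <- plotly_data(p)
group_vars(d)
```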

We mapped x and y, but since we didn’t add a trace or specify a trace type, plot_ly() falls back to its default, a scatterplot:

p
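To make that default explicit rather than relying on plot_ly()’s inference, you can add the marker layer yourself with add_markers(). The sketch below then uses plotly_build() to inspect the resulting trace; the names p_markers and built are ours:

```r
library(plotly)

p_markers <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_markers(alpha = 0.2)

# plotly_build() resolves the object into the trace list sent to plotly.js
built <- plotly_build(p_markers)
built$x$data[[1]]$type  # trace type
built$x$data[[1]]$mode  # scatter mode
```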

Let’s change that to a line chart. Similar to geom_line() in ggplot2, add_lines() connects the x/y pairs within each group in order of their x values, and returns the modified plotly object:

p %>%
  add_lines(alpha = 0.2, name = "Texan Cities")
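Grouping is not the only way to get one line per city. With group_by(), plotly should draw all cities as a single trace, separating the groups with missing values, whereas split = ~city creates one trace per city, each with its own legend entry. A sketch that compares the two by counting traces with plotly_build(); the object names are ours:

```r
library(plotly)
library(dplyr)

# Grouped: all cities live in one trace, separated by gaps
p_grouped <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2)

# Split: a distinct trace for every city
p_split <- plot_ly(txhousing, x = ~date, y = ~median) %>%
  add_lines(split = ~city, alpha = 0.2)

n_grouped <- length(plotly_build(p_grouped)$x$data)
n_split   <- length(plotly_build(p_split)$x$data)
c(grouped = n_grouped, split = n_split)
```

A single trace is lighter to render; separate traces are handier when you want to toggle individual cities from the legend.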

Highlighting

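Want to highlight a particular line? dplyr verbs such as filter() work on plotly objects too, and since add_lines() returns the modified plotly object, we can chain the calls together with pipes. The first add_lines() draws every city; after filter() keeps only Houston, the second add_lines() draws Houston on top: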

p <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none") %>%
  filter(city == "Houston") %>%
  add_lines(name = "Houston")
p
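The same pattern extends to several cities at once: filter() with %in% keeps a set of cities, and mapping color = ~city gives each highlighted city its own trace and legend entry. A sketch; the choice of Houston and Dallas and the name highlight_cities are arbitrary:

```r
library(plotly)
library(dplyr)

highlight_cities <- c("Houston", "Dallas")  # arbitrary example choice

p_multi <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  # background: every city, faded, no hover labels
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none") %>%
  # keep only the chosen cities for the next layer
  filter(city %in% highlight_cities) %>%
  # foreground: one colored trace per highlighted city
  add_lines(color = ~city)
p_multi
```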

Zooming with context

Want to zoom in without losing context? Try rangeslider():

rangeslider(p)
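rangeslider() also takes start and end arguments that set the initially selected window. Since date is a decimal year, plain numbers work. A sketch that rebuilds the Houston plot and opens on an arbitrary 2005 to 2010 window (the name p_zoom is ours):

```r
library(plotly)
library(dplyr)

p <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none") %>%
  filter(city == "Houston") %>%
  add_lines(name = "Houston")

# Start zoomed in on 2005-2010; the slider still shows the full series
p_zoom <- rangeslider(p, start = 2005, end = 2010)
p_zoom
```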

Any ggplot will do…

And just so you don’t think we’re limited to line charts:

p2 <- txhousing %>%
  ggplot(aes(date, median)) + geom_bin2d()
p2 <- ggplotly(p2)
p2
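A similar binned view can be built natively with plot_ly(). With geom_bin2d(), ggplot2 computes the bins in R before conversion; with add_histogram2d(), the binning is left to plotly.js in the browser. A sketch (the name p3 is ours):

```r
library(plotly)

# histogram2d traces are binned by plotly.js at render time
p3 <- plot_ly(txhousing, x = ~date, y = ~median) %>%
  add_histogram2d()
p3
```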

Check out The Plotly Cookbook for more details on specific plotly visualization types (“traces”).

Your Turn!

Find a new data set to practice with and create at least 2 different interactive plots.
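If you want a starting point, here is one possible sketch using R’s built-in mtcars data (our choice, not part of the original exercise): a native plot_ly() scatterplot colored by cylinder count, and a ggplotly() conversion of a boxplot.

```r
library(plotly)

# 1. Native plot_ly(): weight vs. fuel economy, one color per cylinder count
p_scatter <- plot_ly(mtcars, x = ~wt, y = ~mpg, color = ~factor(cyl)) %>%
  add_markers()

# 2. ggplot2 converted with ggplotly(): mpg distribution by number of gears
p_box <- ggplot(mtcars, aes(x = factor(gear), y = mpg)) +
  geom_boxplot()
p_box <- ggplotly(p_box)

p_scatter
p_box
```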